Persistent Homology for the Evaluation of Dimensionality Reduction Schemes
Authors
Abstract
High-dimensional data sets are prevalent in many application domains. Such data is commonly visualized using dimensionality reduction (DR) methods. DR methods provide, for example, a two-dimensional embedding of the abstract data that retains relevant high-dimensional characteristics such as local distances between data points. Since the number of DR algorithms from which users may choose is steadily increasing, assessing their quality becomes more and more important. We present a novel technique, based on persistent homology, to quantify and compare the quality of DR algorithms. An inherent benefit of persistent homology is its robustness against noise, which makes it well suited for real-world data. Our pipeline identifies the best DR technique for a given data set and chosen metric (e.g. preservation of local distances), and provides information about the local quality of an embedding that helps users understand the shortcomings of the selected DR method. The utility of our method is demonstrated using application data from multiple domains and a variety of commonly used DR methods.
Similar resources
Persistent homology for low-complexity models
We show that recent results on randomized dimension reduction schemes that exploit structural properties of data can be applied in the context of persistent homology. In the spirit of compressed sensing, the dimension reduction is determined by the Gaussian width of a structure associated to the data set, rather than its size, and such a reduction can be computed efficiently. We further relate ...
Impact of linear dimensionality reduction methods on the performance of anomaly detection algorithms in hyperspectral images
Anomaly Detection (AD) has recently become an important application of hyperspectral image analysis. The goal of these algorithms is to find the objects in the image scene that are anomalous in comparison to their surrounding background. One way to improve the performance and runtime of these algorithms is to use Dimensionality Reduction (DR) techniques. This paper evaluates the effect of thr...
2D Dimensionality Reduction Methods without Loss
In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for a face recognition application. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...
Comparing Dimensionality Reduction Methods Using Data Descriptor Landscapes
Dimensionality reduction (DR) methods are commonly used in data science to turn high-dimensional data into 2D representations. Since data sets contain different structural features that need to be preserved by this process, there is a multitude of DR methods, each geared towards preserving a separate aspect. This makes choosing a suitable algorithm for a given data set a challenging task. In th...
Persistent Homology Analysis of RNA
Topological data analysis has been recently used to extract meaningful information from biomolecules. Here we introduce the application of persistent homology, a topological data analysis tool, for computing persistent features (loops) of the RNA folding space. The scaffold of the RNA folding space is a complex graph from which the global features are extracted by completing the graph to a simplici...
Journal: Comput. Graph. Forum
Volume: 34
Publication date: 2015